Abstract. In this article we investigate the possibilities of accelerating the double
smoothing technique when solving unconstrained nondifferentiable convex optimization
problems. This approach relies on the regularization in two steps of the Fenchel dual
problem associated with the problem to be solved into an optimization problem having
a differentiable strongly convex objective function with Lipschitz continuous gradient.
The doubly regularized dual problem is then solved via a fast gradient method. The
aim of this paper is to show how the properties of the functions in the objective of
the primal problem influence the implementation of the double smoothing approach and
its rate of convergence. The theoretical results are applied to linear inverse problems
by making use of different regularization functionals.
1 Introduction

In this paper we develop an efficient algorithm based on the double smoothing
approach for solving unconstrained nondifferentiable optimization problems of the type

$$(P) \quad \inf_{x \in \mathcal{H}} \{f(x) + g(Ax)\}, \tag{1}$$

where $\mathcal{H}$ is a Hilbert space, $f : \mathcal{H} \to \overline{\mathbb{R}}$ and
$g : \mathbb{R}^m \to \overline{\mathbb{R}}$ are proper, convex and lower semicontinuous
functions and $A : \mathcal{H} \to \mathbb{R}^m$ is a linear continuous operator fulfilling
the feasibility condition $A(\operatorname{dom} f) \cap \operatorname{dom} g \neq \emptyset$.
The double smoothing technique for solving this class of optimization problems (see [8] for
a fully finite-dimensional version of it) consists in efficiently solving the corresponding
Fenchel dual problem and then recovering, via an approximately optimal solution of the
latter, an approximately optimal solution of the primal. This technique, which represents a
generalization of the approach developed in [10, 11] for a special class of convex constrained
optimization problems, makes use of the structure of the Fenchel dual and relies on the
regularization of the latter in two steps into an optimization problem having a differentiable
strongly convex objective function with Lipschitz continuous gradient. The regularized dual is
then solved by a fast gradient method which gives rise to a sequence of dual variables that
approximately solve the non-regularized dual problem after
$O\left(\frac{1}{\varepsilon} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations, whenever
$f$ and $g$ have bounded effective domains. In addition, the norm of the gradient of the
regularized dual objective decreases at the same rate, a fact which is crucial in view of
reconstructing an approximately optimal solution to (P) after
$O\left(\frac{1}{\varepsilon} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations
(see [8]). The first aim of this paper is to show that, whenever $g$ is a strongly convex
function, one can obtain the same convergence rate, even without imposing boundedness
of its effective domain. Further we show that if, additionally, $f$ is strongly convex or $g$
is everywhere differentiable with a Lipschitz continuous gradient, then the convergence
rate becomes $O\left(\frac{1}{\sqrt{\varepsilon}} \ln\left(\frac{1}{\varepsilon}\right)\right)$,
while, if these supplementary assumptions are simultaneously fulfilled, then a convergence
rate of $O\left(\ln\left(\frac{1}{\varepsilon}\right)\right)$ can be guaranteed.



The structure of the paper is the following. The forthcoming section is dedicated to
some preliminaries on convex analysis and Fenchel duality. In Section 3 we employ the
smoothing technique introduced in [13-15] in order to make the objective of the Fenchel
dual problem of (P) strongly convex and differentiable with Lipschitz continuous
gradient. In Section 4 we first solve the regularized dual problem via an efficient fast
gradient method. Then we show how the properties of the functions in the objective
of (P) influence the implementation of the double smoothing approach and improve its
rate of convergence. We also show how an approximately optimal primal solution can
be recovered from a dual iterate. Finally, in Section 5 we consider an application of
the presented approach in image deblurring and solve to this end a linear inverse
problem by using two different regularization functionals.



2 Preliminaries on convex analysis and Fenchel duality 

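Throughout this paper $\langle \cdot, \cdot \rangle$ and
$\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$ denote the inner product and, respectively,
the norm of the Hilbert space $\mathcal{H}$, which is allowed to be infinite dimensional. The
closure of a set $C \subseteq \mathcal{H}$ is denoted by $\operatorname{cl}(C)$, while its
indicator function is the function
$\delta_C : \mathcal{H} \to \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ defined by
$\delta_C(x) = 0$ for $x \in C$ and $\delta_C(x) = +\infty$, otherwise.
For a function $f : \mathcal{H} \to \overline{\mathbb{R}}$ we denote by
$\operatorname{dom} f := \{x \in \mathcal{H} : f(x) < +\infty\}$ its effective
domain. We call $f$ proper if $\operatorname{dom} f \neq \emptyset$ and $f(x) > -\infty$ for
all $x \in \mathcal{H}$. The conjugate function of $f$ is
$f^* : \mathcal{H} \to \overline{\mathbb{R}}$,
$f^*(p) = \sup\{\langle p, x \rangle - f(x) : x \in \mathcal{H}\}$ for all $p \in \mathcal{H}$.
The biconjugate function of $f$ is $f^{**} : \mathcal{H} \to \overline{\mathbb{R}}$,
$f^{**}(x) = \sup\{\langle x, p \rangle - f^*(p) : p \in \mathcal{H}\}$ and,
when $f$ is proper, convex and lower semicontinuous, then, according to the
Fenchel-Moreau Theorem, one has $f = f^{**}$. The (convex) subdifferential of the function $f$
at $x \in \mathcal{H}$ is the set
$\partial f(x) = \{p \in \mathcal{H} : f(y) - f(x) \geq \langle p, y - x \rangle \ \forall y \in \mathcal{H}\}$,
if $f(x) \in \mathbb{R}$, and is taken to be the empty set, otherwise.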

Further, we consider the space $\mathbb{R}^m$ endowed with the Euclidean inner product and
norm, for which we use the same notations as for the Hilbert space $\mathcal{H}$, since no
confusion can arise. By $\mathbb{1}^m$ we denote the vector in $\mathbb{R}^m$ with all entries
equal to 1. For a subset $C$ of $\mathbb{R}^m$ we denote by $\operatorname{ri}(C)$ its relative
interior, i.e. the interior of the set $C$ relative to its affine hull. For a linear continuous
operator $A : \mathcal{H} \to \mathbb{R}^m$ the operator $A^* : \mathbb{R}^m \to \mathcal{H}$,
defined by $\langle A^*y, x \rangle = \langle y, Ax \rangle$ for all $x \in \mathcal{H}$ and all
$y \in \mathbb{R}^m$, is its so-called adjoint operator. By
$\operatorname{id} : \mathbb{R}^m \to \mathbb{R}^m$, $\operatorname{id}(x) = x$ for all
$x \in \mathbb{R}^m$, we denote the identity mapping on $\mathbb{R}^m$.

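For a nonempty, convex and closed set $C \subseteq \mathcal{H}$ we consider the projection
operator $\mathcal{P}_C : \mathcal{H} \to C$ defined as
$x \mapsto \arg\min_{z \in C} \|x - z\|$. Having two functions
$f, g : \mathcal{H} \to \overline{\mathbb{R}}$, their infimal convolution is defined by
$f \Box g : \mathcal{H} \to \overline{\mathbb{R}}$,
$(f \Box g)(x) = \inf_{y \in \mathcal{H}} \{f(y) + g(x - y)\}$ for all $x \in \mathcal{H}$.
The Moreau envelope ${}^{\gamma}f : \mathcal{H} \to \overline{\mathbb{R}}$ of the
function $f : \mathcal{H} \to \overline{\mathbb{R}}$ of parameter $\gamma > 0$ is defined as
the infimal convolution

$${}^{\gamma}f(x) := \left(f \Box \frac{1}{2\gamma}\|\cdot\|^2\right)(x) = \inf_{y \in \mathcal{H}} \left\{f(y) + \frac{1}{2\gamma}\|x - y\|^2\right\} \quad \forall x \in \mathcal{H}.$$

For $\rho > 0$ we say that the function $f : \mathcal{H} \to \overline{\mathbb{R}}$ is
$\rho$-strongly convex, if for all $x, y \in \mathcal{H}$ and all $\lambda \in (0, 1)$ it holds

$$f(\lambda x + (1 - \lambda)y) \leq \lambda f(x) + (1 - \lambda) f(y) - \frac{\rho}{2}\lambda(1 - \lambda)\|x - y\|^2.$$

Notice that this is equivalent to saying that $x \mapsto f(x) - \frac{\rho}{2}\|x\|^2$ is convex.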
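
As a small numerical illustration of the Moreau envelope (a sketch that is not part of the original development): for $f = |\cdot|$ on $\mathbb{R}$ the envelope of parameter $\gamma$ is the well-known Huber function. The function names, the grid bounds and the grid resolution below are assumptions chosen only for this example; the grid search is a crude stand-in for the infimum in the definition.

```python
def huber(x, gamma):
    """Closed-form Moreau envelope of f = |.| with parameter gamma > 0."""
    if abs(x) <= gamma:
        return x * x / (2.0 * gamma)
    return abs(x) - gamma / 2.0


def moreau_envelope_on_grid(f, x, gamma, lo=-5.0, hi=5.0, steps=20001):
    """Approximate inf_y { f(y) + |x - y|^2 / (2 * gamma) } by a grid search."""
    width = (hi - lo) / (steps - 1)
    best = float("inf")
    for i in range(steps):
        y = lo + i * width
        best = min(best, f(y) + (x - y) ** 2 / (2.0 * gamma))
    return best


if __name__ == "__main__":
    # Compare the closed form with the brute-force infimum at a few points.
    for x in (-2.0, -0.3, 0.0, 0.4, 1.5):
        print(x, huber(x, 0.5), moreau_envelope_on_grid(abs, x, 0.5))
```

The two columns agree up to the grid resolution, which illustrates that the envelope is a smooth lower approximation of the nonsmooth function.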

For the optimization problem (P) we consider the following standing assumptions:
$f : \mathcal{H} \to \overline{\mathbb{R}}$ is a proper, convex and lower semicontinuous
function with a bounded effective domain, $g : \mathbb{R}^m \to \overline{\mathbb{R}}$ is a
proper, $\mu$-strongly convex ($\mu > 0$) and lower semicontinuous function and
$A : \mathcal{H} \to \mathbb{R}^m$ is a linear operator fulfilling
$A(\operatorname{dom} f) \cap \operatorname{dom} g \neq \emptyset$.

Remark 1. In contrast to the investigations made in [8] in a fully finite-dimensional
setting, we strengthen here the convexity assumptions on $g$ (there $g$ was asked to be
only proper, convex and lower semicontinuous), but allow in return $\operatorname{dom} g$ to be
unbounded.



The Fenchel dual problem to (P) (see, for instance, [5, 6]) reads

$$(D) \quad \sup_{p \in \mathbb{R}^m} \{-f^*(A^*p) - g^*(-p)\}. \tag{2}$$

We denote the optimal objective values of the optimization problems (P) and (D) by
$v(P)$ and $v(D)$, respectively.

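The conjugate functions of $f$ and $g$ can be written as

$$f^*(q) = \sup_{x \in \operatorname{dom} f} \{\langle q, x \rangle - f(x)\} = -\inf_{x \in \operatorname{dom} f} \{\langle -q, x \rangle + f(x)\} \quad \forall q \in \mathcal{H}$$

and

$$g^*(p) = \sup_{x \in \operatorname{dom} g} \{\langle p, x \rangle - g(x)\} = -\inf_{x \in \operatorname{dom} g} \{\langle -p, x \rangle + g(x)\} \quad \forall p \in \mathbb{R}^m,$$

respectively. According to [1, Theorem 11.9] and [4, Lemma 2.33], the optimization
problems arising in the formulation of both $f^*(q)$ for all $q \in \mathcal{H}$ and $g^*(p)$
for all $p \in \mathbb{R}^m$ are solvable, a fact which implies that
$\operatorname{dom} f^* = \mathcal{H}$ and $\operatorname{dom} g^* = \mathbb{R}^m$, respectively.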

By writing the dual problem (D) equivalently as the infimum optimization problem

$$\inf_{p \in \mathbb{R}^m} \{f^*(A^*p) + g^*(-p)\},$$

one can easily see that the Fenchel dual problem of the latter is

$$\sup_{x \in \mathcal{H}} \{-f^{**}(x) - g^{**}(Ax)\},$$

which, by the Fenchel-Moreau Theorem, is nothing else than

$$\sup_{x \in \mathcal{H}} \{-f(x) - g(Ax)\}.$$

In order to guarantee strong duality for this primal-dual pair it is sufficient to ensure that
(see, for instance, [5, Theorem 2.1])
$0 \in \operatorname{ri}(A^*(\operatorname{dom} g^*) + \operatorname{dom} f^*)$. As $f^*$ has
full domain, this regularity condition is automatically fulfilled, which means that
$v(D) = v(P)$ and the primal optimization problem (P) has an optimal solution. Due to the fact
that $f$ and $g$ are proper and
$A(\operatorname{dom} f) \cap \operatorname{dom} g \neq \emptyset$, this further implies
$v(D) = v(P) \in \mathbb{R}$. Later we will assume that the dual problem (D) has an optimal
solution, too, and that an upper bound of its norm is known.

Denote by $\theta : \mathbb{R}^m \to \mathbb{R}$, $\theta(p) = f^*(A^*p) + g^*(-p)$, the
objective function of (D). Hence, the dual can be equivalently written as

$$(D) \quad -\inf_{p \in \mathbb{R}^m} \theta(p). \tag{3}$$

The assumptions made on $g$ yield that $p \mapsto g^*(-p)$ is differentiable and has a
Lipschitz continuous gradient (see Subsection 3.1 for details). However, since in general one
cannot guarantee the smoothness of $p \mapsto f^*(A^*p)$, the dual problem (D) is a
nondifferentiable convex optimization problem. Our goal is to solve this problem efficiently and
to obtain from here an optimal solution to (P). As in [8], we overcome the unsatisfactory
complexity of subgradient schemes, i.e. $O\left(\frac{1}{\varepsilon^2}\right)$, by making use of
smoothing techniques introduced in [13-15]. More precisely, we first regularize the objective
function of the maximization problem defining $f^*(A^*p)$ by a quadratic term in order to obtain
a smooth approximation of $p \mapsto f^*(A^*p)$. Then we apply a second regularization to the
new dual objective and minimize the regularized problem via an appropriate fast gradient scheme
(see [8]). This will allow us to solve both optimization problems (D) and (P) approximately in
$O\left(\frac{1}{\varepsilon} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations. More than
that, we will show that this rate of convergence can be improved when strengthening the
assumptions imposed on $f$ and $g$.



3 The double smoothing approach 
3.1 First smoothing 

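For a real number $\rho > 0$ the function
$p \mapsto f^*(A^*p) = \sup_{x \in \mathcal{H}} \{\langle A^*p, x \rangle - f(x)\}$ can be
approximated by

$$f^*_\rho(A^*p) = \sup_{x \in \mathcal{H}} \left\{\langle A^*p, x \rangle - f(x) - \frac{\rho}{2}\|x\|^2\right\}. \tag{4}$$

For each $p \in \mathbb{R}^m$ the maximization problem which occurs in the formulation of
$f^*_\rho(A^*p)$ has a unique solution (see, for instance, [1, Proposition 11.14]), a fact which
implies that $f^*_\rho(A^*p) \in \mathbb{R}$.

For all $p \in \mathbb{R}^m$ one can express the above regularization of the conjugate by means
of the Moreau envelope of $f$ as follows:

$$\begin{aligned}
-f^*_\rho(A^*p) &= -\sup_{x \in \mathcal{H}} \left\{\langle A^*p, x \rangle - f(x) - \frac{\rho}{2}\|x\|^2\right\} \\
&= \inf_{x \in \mathcal{H}} \left\{-\langle A^*p, x \rangle + f(x) + \frac{\rho}{2}\|x\|^2\right\} \\
&= \inf_{x \in \mathcal{H}} \left\{f(x) + \frac{\rho}{2}\left\|\frac{A^*p}{\rho} - x\right\|^2\right\} - \frac{\|A^*p\|^2}{2\rho} \\
&= {}^{\frac{1}{\rho}}f\left(\frac{A^*p}{\rho}\right) - \frac{\|A^*p\|^2}{2\rho}.
\end{aligned}$$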

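Consequently, one can transfer the differentiability properties of the Moreau envelope
(see [1, Proposition 12.29]) to $p \mapsto -(f^*_\rho \circ A^*)(p)$. For all
$p \in \mathbb{R}^m$ we have

$$-\nabla(f^*_\rho \circ A^*)(p) = \frac{A}{\rho}\nabla\left({}^{\frac{1}{\rho}}f\right)\left(\frac{A^*p}{\rho}\right) - \frac{AA^*p}{\rho} = A\left(\frac{A^*p}{\rho} - x_{f,p}\right) - \frac{AA^*p}{\rho} = -Ax_{f,p},$$

thus

$$\nabla(f^*_\rho \circ A^*)(p) = Ax_{f,p},$$

where $x_{f,p} \in \mathcal{H}$ is the proximal point of parameter $\frac{1}{\rho}$ of $f$ at
$\frac{A^*p}{\rho}$, namely the unique element in $\mathcal{H}$ fulfilling
(see [1, Proposition 12.29])

$${}^{\frac{1}{\rho}}f\left(\frac{A^*p}{\rho}\right) = f(x_{f,p}) + \frac{\rho}{2}\left\|\frac{A^*p}{\rho} - x_{f,p}\right\|^2.$$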


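By taking into account the nonexpansiveness of the proximal point mapping (see [1,
Proposition 12.27]), for $p, q \in \mathbb{R}^m$ it holds

$$\left\|\nabla(f^*_\rho \circ A^*)(p) - \nabla(f^*_\rho \circ A^*)(q)\right\| = \|Ax_{f,p} - Ax_{f,q}\| \leq \|A\| \, \|x_{f,p} - x_{f,q}\| \leq \|A\| \left\|\frac{A^*p}{\rho} - \frac{A^*q}{\rho}\right\| \leq \frac{\|A\|^2}{\rho}\|p - q\|,$$

thus $\frac{\|A\|^2}{\rho}$ is the Lipschitz constant of
$p \mapsto \nabla(f^*_\rho \circ A^*)(p)$.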
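
The gradient formula and the Lipschitz bound can be checked numerically on a one-dimensional toy instance. The sketch below assumes $\mathcal{H} = \mathbb{R}$, $A = a$ (a scalar) and $f(x) = \lambda |x| + \delta_{[0,c]}(x)$; these choices, the parameter values and all function names are hypothetical and serve only as an illustration. For this $f$ the proximal point is $x_{f,p} = \min\{\max\{(ap - \lambda)/\rho, 0\}, c\}$.

```python
def prox_point(p, a=2.0, lam=0.1, rho=0.5, c=1.0):
    """x_{f,p} for f(x) = lam*|x| + indicator of [0, c] and A = a (scalar)."""
    return min(max((a * p - lam) / rho, 0.0), c)


def smoothed_conjugate(p, a=2.0, lam=0.1, rho=0.5, c=1.0):
    """f_rho^*(A^* p) = <A^* p, x> - f(x) - (rho/2) x^2 evaluated at x = x_{f,p}."""
    x = prox_point(p, a, lam, rho, c)
    return a * p * x - lam * x - 0.5 * rho * x * x


def gradient(p, a=2.0, lam=0.1, rho=0.5, c=1.0):
    """Claimed gradient A x_{f,p} of the map p -> f_rho^*(A^* p)."""
    return a * prox_point(p, a, lam, rho, c)


def central_difference(p, h=1e-6):
    """Finite-difference approximation of the derivative of smoothed_conjugate."""
    return (smoothed_conjugate(p + h) - smoothed_conjugate(p - h)) / (2.0 * h)


if __name__ == "__main__":
    for p in (-1.0, 0.1, 0.2, 0.5):
        print(p, gradient(p), central_difference(p))
```

With the default values the Lipschitz constant $\|A\|^2/\rho$ equals $a^2/\rho = 8$, which is attained on the interval where the clipping is inactive.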

Coming now to the function $p \mapsto g^*(-p) = (g^* \circ -\operatorname{id})(p)$, let us
notice first that, since $g$ is proper, $\mu$-strongly convex and lower semicontinuous, $g^*$ is
differentiable and $\nabla g^*$ is Lipschitz continuous with Lipschitz constant
$\frac{1}{\mu}$. Thus $(g^* \circ -\operatorname{id})$ is Fréchet differentiable, too, and its
gradient is Lipschitz continuous with Lipschitz constant $\frac{1}{\mu}$.

By denoting

$$x_{g,p} := \nabla g^*(-p) = -\nabla(g^* \circ -\operatorname{id})(p),$$

one has that $-p \in \partial g(x_{g,p})$ or, equivalently,
$0 \in \partial(\langle p, \cdot \rangle + g)(x_{g,p})$, which means that $x_{g,p}$
is the unique optimal solution (see [4, Lemma 2.33]) of the optimization problem

$$\inf_{x \in \mathbb{R}^m} \{\langle p, x \rangle + g(x)\}.$$

Remark 2. If $f$ is $\rho$-strongly convex, for $\rho > 0$, then there is no need to apply the
first regularization for $p \mapsto f^*(A^*p)$, as this function is already Fréchet
differentiable with a Lipschitz continuous gradient having a Lipschitz constant given by
$\frac{\|A\|^2}{\rho}$. Indeed, the $\rho$-strong convexity of $f$ implies that $f^*$ is Fréchet
differentiable with Lipschitz continuous gradient having a Lipschitz constant given by
$\frac{1}{\rho}$ (see [1, Theorem 18.15]). Hence, for all $p, q \in \mathbb{R}^m$, we have

$$\|\nabla(f^* \circ A^*)(p) - \nabla(f^* \circ A^*)(q)\| = \|A\nabla f^*(A^*p) - A\nabla f^*(A^*q)\| \leq \frac{\|A\|}{\rho}\|A^*p - A^*q\| \leq \frac{\|A\|^2}{\rho}\|p - q\|.$$

By denoting

$$x_{f,p} := \nabla f^*(A^*p),$$

one has that $0 \in \partial(f - \langle A^*p, \cdot \rangle)(x_{f,p})$, which means that
$x_{f,p}$ is the unique optimal solution (see [4, Lemma 2.33]) of the optimization problem

$$\inf_{x \in \mathcal{H}} \{f(x) - \langle A^*p, x \rangle\}.$$

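By denoting $D_f := \sup\left\{\frac{\|x\|^2}{2} : x \in \operatorname{dom} f\right\} \in \mathbb{R}$
we can relate $f^* \circ A^*$ and its smooth approximation $f^*_\rho \circ A^*$ as follows.

Proposition 3. For all $p \in \mathbb{R}^m$ it holds

$$f^*_\rho(A^*p) \leq f^*(A^*p) \leq f^*_\rho(A^*p) + \rho D_f.$$

Proof. For $p \in \mathbb{R}^m$ one has

$$\begin{aligned}
f^*_\rho(A^*p) &= \langle A^*p, x_{f,p} \rangle - f(x_{f,p}) - \frac{\rho}{2}\|x_{f,p}\|^2 \leq \langle A^*p, x_{f,p} \rangle - f(x_{f,p}) \leq f^*(A^*p) \\
&\leq \sup_{x \in \operatorname{dom} f} \left\{\langle A^*p, x \rangle - f(x) - \frac{\rho}{2}\|x\|^2\right\} + \sup_{x \in \operatorname{dom} f} \left\{\frac{\rho}{2}\|x\|^2\right\} \\
&= f^*_\rho(A^*p) + \rho D_f.
\end{aligned}$$

□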
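
The sandwich estimate of Proposition 3 can be illustrated on a one-dimensional example. The following sketch assumes $f(x) = \lambda|x| + \delta_{[0,c]}(x)$ on $\mathbb{R}$, for which $f^*(q) = c \max\{q - \lambda, 0\}$ and $D_f = c^2/2$; the example function, the parameter values and the helper names are assumptions made only for this illustration.

```python
def conjugate(q, lam=0.1, c=1.0):
    """f^*(q) = sup over x in [0, c] of (q - lam) * x."""
    return c * max(q - lam, 0.0)


def smoothed_conjugate(q, rho, lam=0.1, c=1.0):
    """f_rho^*(q) = sup over x in [0, c] of (q - lam) * x - (rho/2) * x^2."""
    x = min(max((q - lam) / rho, 0.0), c)  # unique maximizer
    return (q - lam) * x - 0.5 * rho * x * x


def sandwich_holds(q, rho, lam=0.1, c=1.0, tol=1e-12):
    """Check f_rho^*(q) <= f^*(q) <= f_rho^*(q) + rho * D_f with D_f = c^2 / 2."""
    d_f = 0.5 * c * c
    lower = smoothed_conjugate(q, rho, lam, c)
    exact = conjugate(q, lam, c)
    return lower <= exact + tol and exact <= lower + rho * d_f + tol


if __name__ == "__main__":
    for rho in (0.01, 0.5, 3.0):
        print(rho, all(sandwich_holds(i / 25.0 - 2.0, rho) for i in range(101)))
```

As $\rho$ decreases the two bounds close in on $f^*$, at the price of a larger Lipschitz constant $\|A\|^2/\rho$ for the gradient of the smoothed function.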

For $\rho > 0$ let $\theta_\rho : \mathbb{R}^m \to \mathbb{R}$ be defined by
$\theta_\rho(p) = f^*_\rho(A^*p) + g^*(-p)$. The function $\theta_\rho$ is differentiable with a
Lipschitz continuous gradient

$$\nabla\theta_\rho(p) = \nabla(f^*_\rho \circ A^*)(p) + \nabla(g^* \circ -\operatorname{id})(p) = Ax_{f,p} - x_{g,p} \quad \forall p \in \mathbb{R}^m,$$

having as Lipschitz constant $L(\rho) := \frac{\|A\|^2}{\rho} + \frac{1}{\mu}$.
In consideration of Proposition 3 we get

$$\theta_\rho(p) \leq \theta(p) \leq \theta_\rho(p) + \rho D_f \quad \forall p \in \mathbb{R}^m. \tag{5}$$

In order to reconstruct an approximately optimal solution to the primal optimization
problem (P) it is not sufficient to ensure the convergence of $\theta(\cdot)$ to $-v(D)$, but we
also need good convergence properties for the decrease of $\|\nabla\theta_\rho(\cdot)\|$
(cf. [8, 10, 11]).

3.2 Second smoothing

In the following, a second regularization is applied to $\theta_\rho$, as done in [8, 10, 11],
in order to make it strongly convex, a fact which will allow us to use a fast gradient scheme
with a good convergence rate for the decrease of $\|\nabla\theta_\rho(\cdot)\|$. Therefore,
adding the strongly convex function $\frac{\kappa}{2}\|\cdot\|^2$ to $\theta_\rho$, for some
positive real number $\kappa$, gives rise to the following regularization of the objective
function

$$\theta_{\rho,\kappa} : \mathbb{R}^m \to \mathbb{R}, \quad \theta_{\rho,\kappa}(p) := \theta_\rho(p) + \frac{\kappa}{2}\|p\|^2 = f^*_\rho(A^*p) + g^*(-p) + \frac{\kappa}{2}\|p\|^2,$$

which is obviously $\kappa$-strongly convex. We further deal with the optimization problem

$$\inf_{p \in \mathbb{R}^m} \theta_{\rho,\kappa}(p). \tag{6}$$

By taking into account [4, Lemma 2.33], the optimization problem (6) has a unique
optimal solution, while the function $\theta_{\rho,\kappa}$ is differentiable and for all
$p \in \mathbb{R}^m$ it holds

$$\nabla\theta_{\rho,\kappa}(p) = \nabla\left(\theta_\rho(\cdot) + \frac{\kappa}{2}\|\cdot\|^2\right)(p) = Ax_{f,p} - x_{g,p} + \kappa p.$$

This gradient is Lipschitz continuous with constant
$L(\rho, \kappa) := \frac{\|A\|^2}{\rho} + \frac{1}{\mu} + \kappa$.

Remark 4. If $\theta_\rho$ is $\kappa$-strongly convex, then there is no need to apply the
second regularization, as this function is already endowed with the properties of
$\theta_{\rho,\kappa}$.



4 Solving the doubly regularized dual problem 

4.1 A fast gradient method 

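In the forthcoming sections we denote by $p^*_{DS}$ the unique optimal solution of the
optimization problem (6) and by $\theta^*_{\rho,\kappa} := \theta_{\rho,\kappa}(p^*_{DS})$ its
optimal objective value. Further, we denote by $p^* \in \mathbb{R}^m$ an optimal solution to the
dual optimization problem (D) and we assume that the upper bound

$$\|p^*\| \leq R \tag{7}$$

is available for some nonzero $R \in \mathbb{R}_+$.

Furthermore, we make use of the following fast gradient method (see [12, Algorithm
2.2.11])

Init.: Set $w_0 = p_0 := 0 \in \mathbb{R}^m$.

For $k \geq 0$: Set $p_{k+1} := w_k - \frac{1}{L(\rho,\kappa)}\nabla\theta_{\rho,\kappa}(w_k)$. (8)

Set $w_{k+1} := p_{k+1} + \frac{\sqrt{L(\rho,\kappa)} - \sqrt{\kappa}}{\sqrt{L(\rho,\kappa)} + \sqrt{\kappa}}(p_{k+1} - p_k)$

for minimizing the optimization problem (6), which has a strongly convex and
differentiable objective function with a Lipschitz continuous gradient. By taking into
account [12, Theorem 2.2.3] we obtain a sequence $(p_k)_{k \geq 0} \subseteq \mathbb{R}^m$
satisfying

$$\theta_{\rho,\kappa}(p_k) - \theta^*_{\rho,\kappa} \leq \left(\theta_{\rho,\kappa}(p_0) - \theta^*_{\rho,\kappa} + \frac{\kappa}{2}\|p_0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \tag{9}$$

$$\leq 2\left(\theta_{\rho,\kappa}(p_0) - \theta^*_{\rho,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \quad \forall k \geq 0, \tag{10}$$

while the last inequality is a consequence of [12, Theorem 2.1.8]. Since $p^*_{DS}$ solves (6),
we have $\nabla\theta_{\rho,\kappa}(p^*_{DS}) = 0$ and therefore [12, Theorem 2.1.5] yields

$$\frac{1}{2L(\rho,\kappa)}\|\nabla\theta_{\rho,\kappa}(p_k)\|^2 \leq \theta_{\rho,\kappa}(p_k) - \theta^*_{\rho,\kappa} \leq 2\left(\theta_{\rho,\kappa}(p_0) - \theta^*_{\rho,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}},$$

which implies

$$\|\nabla\theta_{\rho,\kappa}(p_k)\|^2 \leq 4L(\rho,\kappa)\left(\theta_{\rho,\kappa}(p_0) - \theta^*_{\rho,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \quad \forall k \geq 0. \tag{11}$$

Due to the $\kappa$-strong convexity of $\theta_{\rho,\kappa}$, [12, Theorem 2.1.8] states

$$\frac{\kappa}{2}\|p_k - p^*_{DS}\|^2 \leq \theta_{\rho,\kappa}(p_k) - \theta^*_{\rho,\kappa} \leq 2\left(\theta_{\rho,\kappa}(p_0) - \theta^*_{\rho,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \quad \forall k \geq 0. \tag{12}$$

Using this inequality it follows that (see also [10, 11])

$$\|p_k - p^*_{DS}\|^2 \leq \min\left\{\|p_0 - p^*_{DS}\|^2, \ \frac{4}{\kappa}\left(\theta_{\rho,\kappa}(p_0) - \theta^*_{\rho,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}}\right\} \quad \forall k \geq 0. \tag{13}$$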
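
The scheme (8) and the estimate (10) can be tried out on a small strongly convex quadratic. The sketch below is an illustration under stated assumptions, not the authors' implementation: the diagonal test objective, its data and all function names are hypothetical, and the objective merely plays the role of $\theta_{\rho,\kappa}$ with $\kappa = 1$ and $L(\rho,\kappa) = 100$.

```python
import math

# Hypothetical test objective: theta(p) = sum_i (0.5 * a_i * p_i^2 - b_i * p_i),
# which is kappa-strongly convex with L-Lipschitz gradient for kappa = 1, L = 100.
A_DIAG = [1.0, 10.0, 100.0]
B_VEC = [1.0, 1.0, 1.0]


def theta(p):
    return sum(0.5 * a * x * x - b * x for a, b, x in zip(A_DIAG, B_VEC, p))


def grad_theta(p):
    return [a * x - b for a, b, x in zip(A_DIAG, B_VEC, p)]


def fast_gradient(grad, L, kappa, p0, iterations):
    """Constant-step fast gradient scheme (8); returns all iterates p_0, ..., p_K."""
    beta = (math.sqrt(L) - math.sqrt(kappa)) / (math.sqrt(L) + math.sqrt(kappa))
    p, w = list(p0), list(p0)
    history = [list(p)]
    for _ in range(iterations):
        g = grad(w)
        p_next = [wi - gi / L for wi, gi in zip(w, g)]                # gradient step at w_k
        w = [pn + beta * (pn - pi) for pn, pi in zip(p_next, p)]     # extrapolation step
        p = p_next
        history.append(list(p))
    return history


def bound_10(gap0, kappa, L, k):
    """Right-hand side of (10): 2 * (theta(p_0) - theta^*) * exp(-k * sqrt(kappa / L))."""
    return 2.0 * gap0 * math.exp(-k * math.sqrt(kappa / L))


if __name__ == "__main__":
    iterates = fast_gradient(grad_theta, 100.0, 1.0, [0.0, 0.0, 0.0], 200)
    p_star = [b / a for a, b in zip(A_DIAG, B_VEC)]
    gap0 = theta(iterates[0]) - theta(p_star)
    for k in (0, 50, 100, 200):
        print(k, theta(iterates[k]) - theta(p_star), bound_10(gap0, 1.0, 100.0, k))
```

Only gradient evaluations and the two constants $L(\rho,\kappa)$ and $\kappa$ enter the iteration, which is why the choice of the smoothing parameters directly controls the contraction factor $e^{-\sqrt{\kappa/L(\rho,\kappa)}}$.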

We first prove that the rates of convergence for the decrease of
$\theta(p_k) - \theta(p^*)$ and $\|\nabla\theta_\rho(p_k)\|$ coincide, being equal to
$O\left(\frac{1}{\varepsilon} \ln\left(\frac{1}{\varepsilon}\right)\right)$, and that they can be
improved when $f$ and/or $g$ fulfill additional assumptions. We also show how
$\varepsilon$-optimal solutions to the primal problem (P) can be recovered from the sequence of
dual variables $(p_k)_{k \geq 0}$.

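4.2 Convergence of $\theta(p_k)$ to $\theta(p^*)$

Since the algorithm starts with $p_0 = 0$, we have
$\theta_{\rho,\kappa}(0) = f^*_\rho(0) + g^*(0) + \frac{\kappa}{2}\|0\|^2 = \theta_\rho(0)$,
while

$$\theta_{\rho,\kappa}(p^*_{DS}) = \theta_\rho(p^*_{DS}) + \frac{\kappa}{2}\|p^*_{DS}\|^2. \tag{14}$$

Making use of these two relations and of (12) for $k = 0$ we obtain

$$\frac{\kappa}{2}\|p^*_{DS}\|^2 \leq \theta_{\rho,\kappa}(0) - \theta_{\rho,\kappa}(p^*_{DS}) = \theta_\rho(0) - \theta_\rho(p^*_{DS}) - \frac{\kappa}{2}\|p^*_{DS}\|^2,$$

which further implies that

$$\|p^*_{DS}\|^2 \leq \frac{1}{\kappa}\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right). \tag{15}$$

Additionally, in all iterations $k \geq 0$, we have by (12), (9) and (14)

$$\begin{aligned}
\|p_k - p^*_{DS}\|^2 &\leq \frac{2}{\kappa}\left(\theta_{\rho,\kappa}(p_k) - \theta_{\rho,\kappa}(p^*_{DS})\right) \\
&\leq \frac{2}{\kappa}\left(\theta_{\rho,\kappa}(0) - \theta_{\rho,\kappa}(p^*_{DS}) + \frac{\kappa}{2}\|0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \\
&= \frac{2}{\kappa}\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}}
\end{aligned} \tag{16}$$

and

$$\begin{aligned}
\theta_\rho(p_k) - \theta_\rho(p^*_{DS}) &\leq \left(\theta_{\rho,\kappa}(0) - \theta_{\rho,\kappa}(p^*_{DS}) + \frac{\kappa}{2}\|0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} + \frac{\kappa}{2}\left(\|p^*_{DS}\|^2 - \|p_k\|^2\right) \\
&= \left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} + \frac{\kappa}{2}\left(\|p^*_{DS}\|^2 - \|p_k\|^2\right).
\end{aligned} \tag{17}$$

The estimation

$$\begin{aligned}
\|p^*_{DS}\|^2 - \|p_k\|^2 &= \left(\|p^*_{DS}\| - \|p_k\|\right)\left(\|p^*_{DS}\| + \|p_k\|\right) \\
&\leq \|p^*_{DS} - p_k\|\left(\|p^*_{DS}\| + \|p_k\|\right) \\
&\leq \|p^*_{DS} - p_k\|\left(2\|p^*_{DS}\| + \|p_k - p^*_{DS}\|\right) \\
&\leq 3\|p^*_{DS} - p_k\| \, \|p^*_{DS}\| \\
&\leq 3\|p^*_{DS}\| \sqrt{\frac{2}{\kappa}\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \\
&\leq \frac{3\sqrt{2}}{\kappa}\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}},
\end{aligned}$$

where the last three inequalities use (13), (16) and (15), respectively, can now be inserted
into (17) and this leads to

$$\begin{aligned}
\theta_\rho(p_k) - \theta_\rho(p^*_{DS}) &\leq \left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right)\left(e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} + \frac{3}{\sqrt{2}} e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}}\right) \\
&\leq \frac{25}{8}\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \quad \forall k \geq 0.
\end{aligned} \tag{18}$$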



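Further, we have $\theta_\rho(0) \leq \theta(0)$,
$\theta_\rho(p^*_{DS}) \geq \theta(p^*_{DS}) - \rho D_f \geq \theta(p^*) - \rho D_f$ and, from
here,

$$\theta_\rho(0) - \theta_\rho(p^*_{DS}) \leq \theta(0) - \theta(p^*) + \rho D_f. \tag{19}$$

Since $\theta_\rho(p^*_{DS}) \leq \theta_\rho(p^*_{DS}) + \frac{\kappa}{2}\|p^*_{DS}\|^2 \leq \theta_\rho(p^*) + \frac{\kappa}{2}\|p^*\|^2$,
we obtain that

$$\theta_\rho(p^*_{DS}) \leq \theta_\rho(p^*) + \frac{\kappa}{2}\|p^*\|^2 \leq \theta(p^*) + \frac{\kappa}{2}\|p^*\|^2$$

and, therefore,

$$\theta_\rho(p_k) - \theta_\rho(p^*_{DS}) \geq \theta(p_k) - \rho D_f - \theta(p^*) - \frac{\kappa}{2}\|p^*\|^2 \quad \forall k \geq 0. \tag{20}$$

In conclusion, by (20), (18), (7) and (19), we obtain for all $k \geq 0$

$$\begin{aligned}
\theta(p_k) - \theta(p^*) &\leq \rho D_f + \frac{\kappa}{2}\|p^*\|^2 + \theta_\rho(p_k) - \theta_\rho(p^*_{DS}) \\
&\leq \rho D_f + \frac{\kappa}{2}R^2 + \frac{25}{8}\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \\
&\leq \rho D_f + \frac{\kappa}{2}R^2 + \frac{25}{8}\left(\theta(0) - \theta(p^*) + \rho D_f\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}}.
\end{aligned} \tag{21}$$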

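Next we fix $\varepsilon > 0$. In order to get $\theta(p_k) - \theta(p^*) \leq \varepsilon$
after a certain number of iterations $k$, we force all three terms in (21) to be less than or
equal to $\frac{\varepsilon}{3}$. To this end we choose first

$$\rho := \rho(\varepsilon) = \frac{\varepsilon}{3D_f} \quad \text{and} \quad \kappa := \kappa(\varepsilon) = \frac{2\varepsilon}{3R^2}. \tag{22}$$

With these new parameters we can simplify (21) to

$$\theta(p_k) - \theta(p^*) \leq \frac{2\varepsilon}{3} + \frac{25}{8}\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \quad \forall k \geq 0,$$

thus, the second term in the expression on the right-hand side of the above estimate
determines the number of iterations needed to obtain $\varepsilon$-accuracy for the dual
objective function $\theta$. Indeed, we have

$$\begin{aligned}
\frac{\varepsilon}{3} \geq \frac{25}{8}\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}}
&\Leftrightarrow e^{\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \geq \frac{3}{\varepsilon} \cdot \frac{25}{8}\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right) \\
&\Leftrightarrow \frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}} \geq \ln\left(\frac{75\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right)}{8\varepsilon}\right) \\
&\Leftrightarrow k \geq 2\sqrt{\frac{L(\rho,\kappa)}{\kappa}} \ln\left(\frac{75\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right)}{8\varepsilon}\right).
\end{aligned} \tag{23}$$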



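Noticing that

$$\frac{L(\rho,\kappa)}{\kappa} = \frac{\|A\|^2}{\rho\kappa} + \frac{1}{\mu\kappa} + 1 = \frac{9\|A\|^2 D_f R^2}{2\varepsilon^2} + \frac{3R^2}{2\mu\varepsilon} + 1 = \frac{1}{\varepsilon^2}\left(\frac{9\|A\|^2 D_f R^2}{2} + \frac{3R^2\varepsilon}{2\mu} + \varepsilon^2\right),$$

in order to obtain an approximately optimal solution to (D), we need
$k = O\left(\frac{1}{\varepsilon} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations.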
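
The parameter choice (22) and the iteration estimate (23) translate directly into a few lines of code. The following sketch assumes that $\|A\|$, $D_f$, $R$, $\mu$ and an upper estimate `initial_gap` of $\theta(0) - \theta(p^*)$ are available as plain numbers; the function names and the sample values are hypothetical.

```python
import math


def double_smoothing_parameters(eps, norm_A, D_f, R, mu):
    """Smoothing parameters from (22) and the Lipschitz constant L(rho, kappa)."""
    rho = eps / (3.0 * D_f)
    kappa = 2.0 * eps / (3.0 * R ** 2)
    L = norm_A ** 2 / rho + 1.0 / mu + kappa
    return rho, kappa, L


def iteration_bound(eps, norm_A, D_f, R, mu, initial_gap):
    """Right-hand side of (23); initial_gap stands for theta(0) - theta(p^*)."""
    rho, kappa, L = double_smoothing_parameters(eps, norm_A, D_f, R, mu)
    return 2.0 * math.sqrt(L / kappa) * math.log(
        75.0 * (initial_gap + eps / 3.0) / (8.0 * eps)
    )


def ratio_closed_form(eps, norm_A, D_f, R, mu):
    """Closed form of L(rho, kappa) / kappa derived in the text."""
    return (
        9.0 * norm_A ** 2 * D_f * R ** 2 / 2.0
        + 3.0 * R ** 2 * eps / (2.0 * mu)
        + eps ** 2
    ) / eps ** 2


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, math.ceil(iteration_bound(eps, 1.0, 2.0, 3.0, 0.5, 1.0)))
```

Since $L(\rho,\kappa)/\kappa$ grows like $1/\varepsilon^2$, its square root contributes the factor $1/\varepsilon$ in the overall estimate, while the logarithm contributes $\ln(1/\varepsilon)$.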



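4.3 Convergence of $\|\nabla\theta_\rho(p_k)\|$ to 0

Guaranteeing $\varepsilon$-optimality for the objective value of the dual is not sufficient for
solving the primal optimization problem with a good convergence rate, as we need at least the
same convergence rate for the decrease of
$\|\nabla\theta_\rho(p_k)\| = \|Ax_{f,p_k} - x_{g,p_k}\|$ to 0. Within
this section we show that this desideratum is attained (see also [10, 11]). Since, by (13),

$$\|p_k\| = \|p_k - p^*_{DS} + p^*_{DS}\| \leq \|p_k - p^*_{DS}\| + \|p^*_{DS}\| \leq 2\|p^*_{DS}\|,$$

we conclude that

$$\|\nabla\theta_\rho(p_k)\| \leq \|\nabla\theta_{\rho,\kappa}(p_k)\| + \|\kappa p_k\| \leq \|\nabla\theta_{\rho,\kappa}(p_k)\| + 2\kappa\|p^*_{DS}\| \quad \forall k \geq 0. \tag{24}$$

We further have, by (11), (14), (19) and (22),

$$\begin{aligned}
\|\nabla\theta_{\rho,\kappa}(p_k)\|^2 &\leq 4L(\rho,\kappa)\left(\theta_{\rho,\kappa}(0) - \theta_{\rho,\kappa}(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \\
&\leq 4L(\rho,\kappa)\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \\
&\leq 4L(\rho,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\kappa)}}},
\end{aligned}$$

which yields

$$\|\nabla\theta_{\rho,\kappa}(p_k)\| \leq 2\sqrt{L(\rho,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} \quad \forall k \geq 0. \tag{25}$$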



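In order to give an upper bound for the second term in (24), we notice that

$$\begin{aligned}
\theta(p^*) + \frac{\kappa}{2}\|p^*\|^2 &\geq \theta_\rho(p^*) + \frac{\kappa}{2}\|p^*\|^2 \geq \theta_\rho(p^*_{DS}) + \frac{\kappa}{2}\|p^*_{DS}\|^2 \\
&\geq \theta(p^*_{DS}) - \rho D_f + \frac{\kappa}{2}\|p^*_{DS}\|^2 \\
&\geq \theta(p^*) - \rho D_f + \frac{\kappa}{2}\|p^*_{DS}\|^2,
\end{aligned}$$

which is equivalent to $\frac{\kappa}{2}\|p^*_{DS}\|^2 \leq \frac{\kappa}{2}\|p^*\|^2 + \rho D_f$,
i.e. $\|p^*_{DS}\|^2 \leq \|p^*\|^2 + \frac{2\rho}{\kappa}D_f$. Hence, by (7) and (22),

$$\|p^*_{DS}\| \leq \sqrt{\|p^*\|^2 + \frac{2\rho}{\kappa}D_f} \leq \sqrt{R^2 + R^2} = \sqrt{2}R, \tag{26}$$

which, combined with (24) and (25), provides

$$\begin{aligned}
\|\nabla\theta_\rho(p_k)\| &\leq 2\sqrt{L(\rho,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} + 2\sqrt{2}\kappa R \\
&= 2\sqrt{L(\rho,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{3}\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\kappa)}}} + \frac{4\sqrt{2}\varepsilon}{3R} \quad \forall k \geq 0.
\end{aligned} \tag{27}$$

For $\varepsilon > 0$ fixed, the first term in (27) decreases with the iteration counter $k$,
and, by taking into account (22), we can ensure

$$\theta(p_k) - \theta(p^*) \leq \varepsilon \quad \text{and} \quad \|\nabla\theta_\rho(p_k)\| \leq \frac{2\varepsilon}{R} \tag{28}$$

in $k = O\left(\frac{1}{\varepsilon} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations.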
4.4 Improved convergence rates

In this subsection we investigate how additional assumptions on the functions $f$ and/or
$g$ influence the implementation of the double smoothing approach and its rate of
convergence.



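4.4.1 The case $f$ is strongly convex

Assuming additionally to the standing assumptions that the function
$f : \mathcal{H} \to \overline{\mathbb{R}}$ is $\rho$-strongly convex, for $\rho > 0$, the first
smoothing, as done in Subsection 3.1, can be omitted and the fast gradient method (8) can be
applied to the function $\theta_\kappa : \mathbb{R}^m \to \mathbb{R}$,
$\theta_\kappa(p) := f^*(A^*p) + g^*(-p) + \frac{\kappa}{2}\|p\|^2$, with $\kappa > 0$, which is
$\kappa$-strongly convex and differentiable with Lipschitz continuous gradient. In the light of
Remark 2 the Lipschitz constant of $\nabla\theta_\kappa$ is
$L(\kappa) := \frac{\|A\|^2}{\rho} + \frac{1}{\mu} + \kappa$.

Similar to the calculations made in Section 4.2 we obtain for all $k \geq 0$

$$\theta(p_k) - \theta(p^*) \leq \frac{\kappa}{2}R^2 + \frac{25}{8}\left(\theta(0) - \theta(p^*)\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\kappa)}}}.$$

Hence, when $\varepsilon > 0$, in order to guarantee $\varepsilon$-accuracy for the dual
objective function we can force both terms in the above estimate to be less than or equal to
$\frac{\varepsilon}{2}$. Thus, by taking

$$\kappa := \kappa(\varepsilon) = \frac{\varepsilon}{R^2},$$

this time we will need to this end, in contrast to (23),

$$k \geq 2\sqrt{\frac{L(\kappa)}{\kappa}} \ln\left(\frac{25\left(\theta(0) - \theta(p^*)\right)}{4\varepsilon}\right),$$

i.e. $k = O\left(\frac{1}{\sqrt{\varepsilon}} \ln\left(\frac{1}{\varepsilon}\right)\right)$
iterations.

In analogy to the considerations made in Section 4.3 we obtain for all $k \geq 0$

$$\|\nabla\theta(p_k)\| \leq 2\sqrt{L(\kappa)\left(\theta(0) - \theta(p^*)\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\kappa)}}} + 2\kappa R = 2\sqrt{L(\kappa)\left(\theta(0) - \theta(p^*)\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\kappa)}}} + \frac{2\varepsilon}{R}.$$

Therefore, in order to guarantee that $\|\nabla\theta(p_k)\|$ is of order
$\frac{\varepsilon}{R}$, we need
$k = O\left(\frac{1}{\sqrt{\varepsilon}} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations,
which coincides with the convergence rate for the dual objective values.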



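4.4.2 The case $g$ is everywhere differentiable with Lipschitz continuous gradient

Assuming additionally to the standing assumptions that the function
$g : \mathbb{R}^m \to \mathbb{R}$ has full domain and is differentiable with
$\frac{1}{\kappa}$-Lipschitz continuous gradient, for $\kappa > 0$, the second smoothing, as done
in Subsection 3.2, can be omitted. The fast gradient method (8) can be applied to the function
$\theta_\rho : \mathbb{R}^m \to \mathbb{R}$, $\theta_\rho(p) := f^*_\rho(A^*p) + g^*(-p)$, which
is $\kappa$-strongly convex due to [1, Theorem 18.15] and differentiable with Lipschitz
continuous gradient. The Lipschitz constant of $\nabla\theta_\rho$ is
$L(\rho) := \frac{\|A\|^2}{\rho} + \frac{1}{\mu}$.

The algorithm (8) applied to $\theta_\rho$ yields

$$\theta_\rho(p_k) - \theta_\rho(p^*_{DS}) \leq \left(\theta_\rho(0) - \theta_\rho(p^*_{DS}) + \frac{\kappa}{2}\|0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho)}}} \leq 2\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho)}}} \quad \forall k \geq 0.$$

Since $\theta_\rho(0) \leq \theta(0)$ and
$\theta_\rho(p^*_{DS}) \geq \theta(p^*_{DS}) - \rho D_f \geq \theta(p^*) - \rho D_f$, we obtain

$$\theta_\rho(0) - \theta_\rho(p^*_{DS}) \leq \theta(0) - \theta(p^*) + \rho D_f.$$

On the other hand, since
$\theta_\rho(p_k) - \theta_\rho(p^*_{DS}) \geq \theta(p_k) - \rho D_f - \theta(p^*)$, it follows

$$\theta(p_k) - \theta(p^*) \leq \rho D_f + 2\left(\theta(0) - \theta(p^*) + \rho D_f\right) e^{-k\sqrt{\frac{\kappa}{L(\rho)}}} \quad \forall k \geq 0. \tag{29}$$

Hence, when $\varepsilon > 0$, in order to guarantee $\varepsilon$-optimality for the dual
objective, we force both terms in the above estimate to be less than or equal to
$\frac{\varepsilon}{2}$. By taking

$$\rho := \rho(\varepsilon) = \frac{\varepsilon}{2D_f},$$

in contrast to (23), we need

$$k \geq \sqrt{\frac{L(\rho)}{\kappa}} \ln\left(\frac{4\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{2}\right)}{\varepsilon}\right), \tag{30}$$

i.e. $k = O\left(\frac{1}{\sqrt{\varepsilon}} \ln\left(\frac{1}{\varepsilon}\right)\right)$
iterations. We obtain as well

$$\begin{aligned}
\|\nabla\theta_\rho(p_k)\| &\leq 2\sqrt{L(\rho)\left(\theta_\rho(0) - \theta_\rho(p^*_{DS})\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho)}}} \\
&\leq 2\sqrt{L(\rho)\left(\theta(0) - \theta(p^*) + \rho D_f\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho)}}} \\
&= 2\sqrt{L(\rho)\left(\theta(0) - \theta(p^*) + \frac{\varepsilon}{2}\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho)}}} \quad \forall k \geq 0.
\end{aligned}$$

Therefore, in order to guarantee that $\|\nabla\theta_\rho(p_k)\|$ is of order
$\frac{\varepsilon}{R}$, we need
$k = O\left(\frac{1}{\sqrt{\varepsilon}} \ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations,
which is the same convergence rate as for the dual objective values.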



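4.4.3 The case $f$ is strongly convex and $g$ is everywhere differentiable with
Lipschitz continuous gradient

Assuming additionally to the standing assumptions that the function
$f : \mathcal{H} \to \overline{\mathbb{R}}$ is $\rho$-strongly convex, for $\rho > 0$, and the
function $g : \mathbb{R}^m \to \mathbb{R}$ has full domain and is differentiable with
$\frac{1}{\kappa}$-Lipschitz continuous gradient, for $\kappa > 0$, both the first and second
smoothing can be omitted. The fast gradient method (8) can be applied to the function
$\theta : \mathbb{R}^m \to \mathbb{R}$, $\theta(p) := f^*(A^*p) + g^*(-p)$, which is
$\kappa$-strongly convex and differentiable with Lipschitz continuous gradient. The Lipschitz
constant of $\nabla\theta$ is $L := \frac{\|A\|^2}{\rho} + \frac{1}{\mu}$.

The fast gradient scheme (8) applied to $\theta$ yields for all $k \geq 0$

$$\theta(p_k) - \theta(p^*) \leq \left(\theta(0) - \theta(p^*) + \frac{\kappa}{2}\|0 - p^*\|^2\right) e^{-k\sqrt{\frac{\kappa}{L}}} \leq 2\left(\theta(0) - \theta(p^*)\right) e^{-k\sqrt{\frac{\kappa}{L}}}$$

and, from here, when $\varepsilon > 0$,

$$2\left(\theta(0) - \theta(p^*)\right) e^{-k\sqrt{\frac{\kappa}{L}}} \leq \varepsilon \ \Leftrightarrow \ k \geq \sqrt{\frac{L}{\kappa}} \ln\left(\frac{2\left(\theta(0) - \theta(p^*)\right)}{\varepsilon}\right).$$

On the other hand, formula (11) states
$\|\nabla\theta(p_k)\| \leq 2\sqrt{L\left(\theta(0) - \theta(p^*)\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L}}}$
for all $k \geq 0$, thus

$$2\sqrt{L\left(\theta(0) - \theta(p^*)\right)} \, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L}}} \leq \varepsilon \ \Leftrightarrow \ k \geq 2\sqrt{\frac{L}{\kappa}} \ln\left(\frac{2\sqrt{L\left(\theta(0) - \theta(p^*)\right)}}{\varepsilon}\right).$$

In conclusion, in order to guarantee $\varepsilon$-accuracy for the dual objective values and
for the decrease of $\|\nabla\theta(\cdot)\|$ to 0, we need
$O\left(\ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations.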
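
To get a feeling for how the three improved estimates scale with $\varepsilon$, one can evaluate the iteration bounds of Subsections 4.4.1 to 4.4.3 for sample data. In the sketch below `gap` stands for $\theta(0) - \theta(p^*)$, and all numerical values and function names are hypothetical; only the formulas are taken from the text.

```python
import math


def bound_f_strongly_convex(eps, norm_A, R, rho, mu, gap):
    """Subsection 4.4.1: only the second smoothing is applied, kappa = eps / R^2."""
    kappa = eps / R ** 2
    L = norm_A ** 2 / rho + 1.0 / mu + kappa
    return 2.0 * math.sqrt(L / kappa) * math.log(25.0 * gap / (4.0 * eps))


def bound_g_smooth(eps, norm_A, D_f, kappa, mu, gap):
    """Subsection 4.4.2, estimate (30): only the first smoothing, rho = eps / (2 D_f)."""
    rho = eps / (2.0 * D_f)
    L = norm_A ** 2 / rho + 1.0 / mu
    return math.sqrt(L / kappa) * math.log(4.0 * (gap + eps / 2.0) / eps)


def bound_both(eps, norm_A, rho, kappa, mu, gap):
    """Subsection 4.4.3: no smoothing is needed; the bound is logarithmic in 1/eps."""
    L = norm_A ** 2 / rho + 1.0 / mu
    return math.sqrt(L / kappa) * math.log(2.0 * gap / eps)


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(
            eps,
            math.ceil(bound_f_strongly_convex(eps, 1.0, 3.0, 2.0, 1.0, 1.0)),
            math.ceil(bound_g_smooth(eps, 1.0, 2.0, 0.5, 1.0, 1.0)),
            math.ceil(bound_both(eps, 1.0, 2.0, 0.5, 1.0, 1.0)),
        )
```

In the last case the condition number $L/\kappa$ does not depend on $\varepsilon$, so dividing $\varepsilon$ by 10 adds only the constant $\sqrt{L/\kappa}\,\ln 10$ to the bound.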

4.5 Constructing an approximate primal solution 

In the remainder of this section we work in the setting of our initial standing assumptions
and show, first of all, how to recover approximately optimal solutions for the primal
(P) from the sequence of approximate dual solutions $(p_k)_{k \geq 0}$. This will be followed
by a convergence analysis for the approximate primal optimal solutions. One can easily
notice that the investigations made here remain valid when working in the special
settings of the previous section, too.

Since our main focus is to solve the primal optimization problem (P), we prove in the
following that the sequences $(x_{f,p_k})_{k \geq 0} \subseteq \operatorname{dom} f$ and
$(x_{g,p_k})_{k \geq 0} \subseteq \operatorname{dom} g$ constructed in
Subsection 3.1 contain all the information one needs to recover approximately optimal
solutions to (P).



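Since, by (5) and (28),
$\theta_\rho(p_k) - \theta(p^*) \leq \theta(p_k) - \theta(p^*) \leq \varepsilon$ and, by (5) and
(22),

$$\theta_\rho(p_k) - \theta(p^*) \geq \theta(p_k) - \rho D_f - \theta(p^*) = \underbrace{\theta(p_k) - \theta(p^*)}_{\geq 0} - \frac{\varepsilon}{3} \geq -\frac{\varepsilon}{3},$$

it holds $|\theta_\rho(p_k) - \theta(p^*)| \leq \varepsilon$ for all $k \geq 0$. Further, for
$p_k \in \mathbb{R}^m$ we have

$$\begin{aligned}
\theta_\rho(p_k) &= f^*_\rho(A^*p_k) + g^*(-p_k) \\
&= \langle p_k, Ax_{f,p_k} \rangle - f(x_{f,p_k}) - \frac{\rho}{2}\|x_{f,p_k}\|^2 - \langle p_k, x_{g,p_k} \rangle - g(x_{g,p_k})
\end{aligned}$$

and from here (notice that $-v(D) = \theta(p^*)$)

$$f(x_{f,p_k}) + g(x_{g,p_k}) - v(D) = \langle p_k, \nabla\theta_\rho(p_k) \rangle + \left(\theta(p^*) - \theta_\rho(p_k)\right) - \frac{\rho}{2}\|x_{f,p_k}\|^2 \quad \forall k \geq 0.$$

It follows, by (22) and (28),

$$\begin{aligned}
\left|f(x_{f,p_k}) + g(x_{g,p_k}) - v(D)\right| &\leq \|p_k\| \, \|\nabla\theta_\rho(p_k)\| + \left|\theta(p^*) - \theta_\rho(p_k)\right| + \frac{\rho}{2}\|x_{f,p_k}\|^2 \\
&\leq \|p_k\| \, \|\nabla\theta_\rho(p_k)\| + \varepsilon + \rho D_f \\
&\leq \|p_k\| \, \|\nabla\theta_\rho(p_k)\| + 2\varepsilon \leq \frac{2\varepsilon}{R}\|p_k\| + 2\varepsilon \quad \forall k \geq 0.
\end{aligned}$$

Further, $\|p_k\|$ can be bounded from above by using (13) and (26):

$$\|p_k\| = \|p_k - p^*_{DS} + p^*_{DS}\| \leq \|p_k - p^*_{DS}\| + \|p^*_{DS}\| \leq 2\|p^*_{DS}\| \leq 2\sqrt{2}R,$$

therefore, we obtain

$$\left|f(x_{f,p_k}) + g(x_{g,p_k}) - v(D)\right| \leq 4\sqrt{2}\varepsilon + 2\varepsilon = 2(2\sqrt{2} + 1)\varepsilon \quad \forall k \geq 0. \tag{31}$$

By taking into account weak duality, i.e. $v(D) \leq v(P)$, we conclude that
$x_{f,p_k} \in \operatorname{dom} f$ and $x_{g,p_k} \in \operatorname{dom} g$ can be seen as
approximately optimal solutions to (P) when $k$ is high enough to satisfy (28).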



4.6 Existence of an optimal solution 

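This section is devoted to the convergence analysis of our primal sequences when
$\varepsilon$ converges to zero. To this end let
$(\varepsilon_n)_{n \geq 0} \subseteq \mathbb{R}_+$ be a decreasing sequence of positive
scalars with $\lim_{n \to \infty} \varepsilon_n = 0$. For each $n \geq 0$, the double smoothing
algorithm (8) with smoothing parameters $\rho_{\varepsilon_n}$ and $\kappa_{\varepsilon_n}$ given
by (22) requires at least $k = k(\varepsilon_n)$ iterations to fulfill (28). For $n \geq 0$ we
denote

$$x_n := x_{f,p_{k(\varepsilon_n)}} \in \operatorname{dom} f \quad \text{and} \quad y_n := x_{g,p_{k(\varepsilon_n)}} \in \operatorname{dom} g.$$

Due to the boundedness of $\operatorname{dom} f$, its closure
$\operatorname{cl}(\operatorname{dom} f)$ is weakly compact (see [1, Theorem 3.3]) and there
exist a subsequence $(x_{n_l})_{l \geq 0}$ and $\bar{x} \in \mathcal{H}$ such that $x_{n_l}$
weakly converges to $\bar{x} \in \operatorname{cl}(\operatorname{dom} f)$ when $l \to +\infty$.
Since $A : \mathcal{H} \to \mathbb{R}^m$ is linear and continuous, the sequence $Ax_{n_l}$ will
converge to $A\bar{x}$ when $l \to +\infty$. In view of relation (28) we get

$$0 \leq \|Ax_{n_l} - y_{n_l}\| \leq \frac{2\varepsilon_{n_l}}{R} \quad \forall l \geq 0. \tag{32}$$

This means that the sequence $(y_{n_l})_{l \geq 0} \subseteq \operatorname{dom} g$ is bounded,
hence there exist a subsequence of it (still denoted by $(y_{n_l})_{l \geq 0}$) and an element
$\bar{y} \in \operatorname{cl}(\operatorname{dom} g)$ such that $y_{n_l} \to \bar{y}$ when
$l \to +\infty$. Taking $l \to +\infty$ in (32) it follows $A\bar{x} = \bar{y}$. Furthermore,
due to (31), we have

$$f(x_{n_l}) + g(y_{n_l}) \leq v(D) + 2(2\sqrt{2} + 1)\varepsilon_{n_l} \quad \forall l \geq 0$$

and, by using the lower semicontinuity of $f$ and $g$ and [1, Theorem 9.1], we obtain

$$f(\bar{x}) + g(A\bar{x}) \leq \liminf_{l \to +\infty} \{f(x_{n_l}) + g(y_{n_l})\} \leq \lim_{l \to +\infty} \{v(D) + 2(2\sqrt{2} + 1)\varepsilon_{n_l}\} = v(D) \leq v(P).$$

Since $v(P) \in \mathbb{R}$, we have $\bar{x} \in \operatorname{dom} f$ and
$A\bar{x} \in \operatorname{dom} g$, which yields that $\bar{x}$ is an optimal
solution to (P).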






5 Two examples in image processing 



In this section we solve a linear inverse problem which arises in the field of signal
and image processing via the double smoothing algorithm developed in this paper. For a
given matrix $A \in \mathbb{R}^{n \times n}$ describing a blur operator and a given vector
$b \in \mathbb{R}^n$ representing the blurred and noisy image, the task is to estimate the
unknown original image $x^* \in \mathbb{R}^n$ fulfilling

$$Ax = b.$$

To this end we make use of two regularization functionals with different properties.



5.1 An $\ell_1$ regularization problem

We start by solving the regularized convex optimization problem

$$(P) \quad \inf_{x \in S} \left\{\|Ax - b\|^2 + \lambda\|x\|_1\right\},$$

where $S \subseteq \mathbb{R}^n$ is an $n$-dimensional cube representing the range of the pixels
and $\lambda > 0$ the regularization parameter. The problem to be solved can be equivalently
written as

$$(P) \quad \inf_{x \in \mathbb{R}^n} \{f(x) + g(Ax)\},$$

for $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, $f(x) = \lambda\|x\|_1 + \delta_S(x)$ and
$g : \mathbb{R}^n \to \mathbb{R}$, $g(y) = \|y - b\|^2$. Thus $f$ is proper, convex and lower
semicontinuous with bounded domain and $g$ is a 2-strongly convex function with full domain,
differentiable everywhere and with Lipschitz continuous gradient having as Lipschitz
constant 2. This means that we are in the setting of Subsection 4.4.2.



By making use of gradient methods, both the iterative shrinkage-thresholding algorithm
(ISTA) (see [9]) and its accelerated variant FISTA (see [2, 3]) solve the optimization
problem (P) in $O\left(\frac{1}{\varepsilon}\right)$ and
$O\left(\frac{1}{\sqrt{\varepsilon}}\right)$ iterations, respectively, whereas the convergence
rate of our method is
$O\left(\frac{1}{\sqrt{\varepsilon}} \ln\left(\frac{1}{\varepsilon}\right)\right)$.

Since each pixel furnishes a greyscale value which is between 0 and 255, a natural
choice for the convex set $S$ would be the $n$-dimensional cube
$[0, 255]^n \subseteq \mathbb{R}^n$. In order to reduce the Lipschitz constant which appears in
the developed approach, we scale the pictures to which we refer within this subsection such
that each of their pixels ranges in the interval $\left[0, \frac{1}{10}\right]$. We concretely
look at the 256 x 256 cameraman test image, which is part of the image processing toolbox in
Matlab. The dimension of the vectorized and scaled cameraman test image is
$n = 256^2 = 65536$. By making use of the Matlab functions imfilter and fspecial, this image is
blurred as follows:



H=fspecial('gaussian',9,4);          % gaussian blur of size 9 times 9
                                     % and standard deviation 4
B=imfilter(X,H,'conv','symmetric');  % B=observed blurred image
                                     % X=original image



In row 1 the function fspecial returns a rotationally symmetric Gaussian lowpass filter
of size 9 × 9 with standard deviation 4. The entries of H are nonnegative and sum
up to 1. In row 3 the function imfilter convolves the filter H with the image
$X \in \mathbb{R}^{256 \times 256}$ and outputs the blurred image $B \in \mathbb{R}^{256 \times 256}$. The boundary option
"symmetric" corresponds to reflexive boundary conditions.



Thanks to the rotationally symmetric filter H, the linear operator $A \in \mathbb{R}^{n \times n}$ given
by the Matlab function imfilter is symmetric, too. By making use of the real spectral
decomposition of $A$, one can show that $\|A\|^2 = 1$. After adding a zero-mean white Gaussian
noise with standard deviation $10^{-4}$, we obtain the blurred and noisy image $b \in \mathbb{R}^n$
which is shown in Figure 5.1.
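The stated properties of H and A can be checked numerically on a small scale. The following is a minimal sketch, not the Matlab implementation: it assumes that the Gaussian kernel is the sampled density $e^{-(i^2+j^2)/(2\sigma^2)}$ normalized to unit sum and that the "symmetric" option mirrors the signal across its border including the edge sample. All function names are hypothetical, and the operator is a one-dimensional analogue acting on 32 samples rather than the operator on $\mathbb{R}^{65536}$ used in the experiments.

```python
import math

def gaussian_kernel_2d(size, sigma):
    """size x size Gaussian kernel, normalized so that its entries sum to 1."""
    r = (size - 1) / 2.0
    raw = [[math.exp(-((i - r) ** 2 + (j - r) ** 2) / (2.0 * sigma ** 2))
            for j in range(size)] for i in range(size)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]

def gaussian_kernel_1d(size, sigma):
    """One-dimensional counterpart of the kernel above."""
    r = (size - 1) // 2
    raw = [math.exp(-(j * j) / (2.0 * sigma ** 2)) for j in range(-r, r + 1)]
    total = sum(raw)
    return [v / total for v in raw]

def reflect(idx, m):
    """Mirror an out-of-range index back into 0..m-1 (edge sample repeated)."""
    if idx < 0:
        return -idx - 1
    if idx >= m:
        return 2 * m - idx - 1
    return idx

def blur_matrix_1d(m, kernel):
    """Matrix of the 1D blur with reflexive boundary conditions (assumed model)."""
    r = (len(kernel) - 1) // 2
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(-r, r + 1):
            A[i][reflect(i + j, m)] += kernel[j + r]
    return A

def largest_abs_eigenvalue(A, iterations=1000):
    """Power iteration; for a symmetric matrix this estimates the spectral norm."""
    v = [1.0 + 0.1 * i for i in range(len(A))]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        estimate = math.sqrt(sum(x * x for x in w))
        v = [x / estimate for x in w]
    return estimate

H = gaussian_kernel_2d(9, 4.0)                       # analogue of fspecial('gaussian',9,4)
A = blur_matrix_1d(32, gaussian_kernel_1d(9, 4.0))   # small 1D analogue of the operator A
norm_A = largest_abs_eigenvalue(A)                   # estimate of ||A||
```

In this model the matrix is symmetric with nonnegative entries and unit row sums, so the constant vector is an eigenvector for the eigenvalue 1 and the norm estimate approaches 1, in line with $\|A\|^2 = 1$.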



Figure 5.1: The 256 x 256 cameraman test image (panels: original; blurred and noisy)

The dual optimization problem in minimization form is

$$(D) \quad -\inf_{p \in \mathbb{R}^n} \left\{ f^*(A^*p) + g^*(-p) \right\}$$

and, due to the fact that $g$ has full domain, strong duality for $(P)$ and $(D)$ holds, i.e.
$v(P) = v(D)$ and $(D)$ has an optimal solution (see, for instance, [5]). By taking into
consideration (30), the smoothing parameter is taken as

$$\rho := \frac{\varepsilon}{2 D_f} \tag{33}$$

for $D_f = \sup\left\{ \frac{\|x\|^2}{2} : x \in \left[0, \frac{1}{10}\right]^n \right\} = 327.68$, while the accuracy is chosen to be $\varepsilon = 0.3$
and the regularization parameter is set to $\lambda$ = 2e-6.
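As a quick arithmetic check of these constants, the sketch below recomputes $D_f$ and the smoothing parameter from (33). It assumes that the supremum of $\|x\|^2/2$ over the cube is attained at the corner whose entries all equal the upper pixel bound $1/10$, which reproduces the value 327.68 for $n = 256^2$. The variable names are illustrative only.

```python
n = 256 ** 2          # dimension of the vectorized cameraman image
upper = 0.1           # assumed upper bound of the scaled pixel range [0, 1/10]
eps = 0.3             # accuracy used in the experiment

# sup of ||x||^2 / 2 over [0, upper]^n is attained at x = (upper, ..., upper)
D_f = n * upper ** 2 / 2.0

# smoothing parameter according to (33)
rho = eps / (2.0 * D_f)
```

With these values $\rho$ is roughly $4.58 \cdot 10^{-4}$.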

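We show next that the sequences of approximate primal solutions $(x_{f,p_k})_{k \geq 0}$ and
$(x_{g,p_k})_{k \geq 0}$ can be easily calculated. Indeed, for $k \geq 0$ we have

$$x_{f,p_k} = \operatorname*{arg\,min}_{x \in [0,\frac{1}{10}]^n} \left\{ \lambda \|x\|_1 + \frac{\rho}{2}\|x\|^2 - \langle A^*p_k, x \rangle \right\} = \operatorname*{arg\,min}_{x \in [0,\frac{1}{10}]^n} \left\{ \lambda \|x\|_1 + \frac{\rho}{2} \left\| \frac{A^*p_k}{\rho} - x \right\|^2 \right\}$$

and, in order to determine it, we need to solve the one-dimensional convex optimization
problem

$$\inf_{x_i \in [0,\frac{1}{10}]} \left\{ \lambda x_i + \frac{\rho}{2} \left( \frac{(A^*p_k)_i}{\rho} - x_i \right)^2 \right\}$$

for $i = 1, \dots, n$, which has as unique optimal solution $\mathcal{P}_{[0,\frac{1}{10}]}\left( \frac{1}{\rho} \left( (A^*p_k)_i - \lambda \right) \right)$. Thus,

$$x_{f,p_k} = \mathcal{P}_{[0,\frac{1}{10}]^n}\left( \frac{1}{\rho} \left( A^*p_k - \lambda \mathbb{1}^n \right) \right).$$

On the other hand, for all $k \geq 0$ we have

$$x_{g,p_k} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \left\{ \langle p_k, x \rangle + g(x) \right\} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \left\{ \langle p_k, x \rangle + \|x - b\|^2 \right\} = b - \frac{1}{2} p_k.$$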
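The two closed-form updates for this subsection can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the vector `ATp` stands for hypothetical entries of $A^*p_k$, the pixel range is taken as $[0, 1/10]$, and the helper names are invented. A coarse grid search over the one-dimensional objective $\lambda x_i + \frac{\rho}{2}\left(\frac{(A^*p_k)_i}{\rho} - x_i\right)^2$ serves as an independent check of the projection formula.

```python
def project_interval(t, lo, hi):
    """Projection of the scalar t onto [lo, hi]."""
    return min(max(t, lo), hi)

def x_f(ATp, lam, rho, hi=0.1):
    """Componentwise formula: projection of ((A* p_k)_i - lam) / rho onto [0, hi]."""
    return [project_interval((a - lam) / rho, 0.0, hi) for a in ATp]

def x_g(p, b):
    """Minimizer of <p, x> + ||x - b||^2, namely b - p / 2."""
    return [bi - 0.5 * pi for pi, bi in zip(p, b)]

def grid_minimizer(objective, lo, hi, steps=20000):
    """Brute-force minimizer on a uniform grid, used only as a sanity check."""
    best_x, best_val = lo, objective(lo)
    for s in range(1, steps + 1):
        x = lo + (hi - lo) * s / steps
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

lam = 2e-6                 # regularization parameter of this subsection
rho = 0.3 / 655.36         # smoothing parameter eps / (2 * D_f)
ATp = [-1e-5, 1e-5, 2.5e-5, 1e-3]   # hypothetical entries of A* p_k

closed = x_f(ATp, lam, rho)
brute = [grid_minimizer(lambda x, a=a: lam * x + 0.5 * rho * (a / rho - x) ** 2,
                        0.0, 0.1) for a in ATp]
```

The first and last entries illustrate the two cases in which the projection is active: a negative argument is clipped to 0 and a large one to the upper bound of the cube.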



Figure 5.2: Iterations of ISTA, FISTA and double smoothing (DS) for solving (P)
(panel titles: ISTA$_{50}$ = 1.314269e-02, FISTA$_{50}$ = 7.096089e-03, DS$_{50}$ = 8.050151e-03;
ISTA$_{100}$ = 9.689755e-03, FISTA$_{100}$ = 6.633611e-03, DS$_{100}$ = 6.755323e-03)



Figure 5.2 shows the iterations 50 and 100 of ISTA, FISTA and the double smoothing
(DS) approach. The objective function values at iteration $k$ are denoted by ISTA$_k$,
FISTA$_k$ and, respectively, DS$_k$ (e.g. DS$_k := f(x_{f,p_k}) + g(Ax_{f,p_k})$). All in all, the visual
quality of the restored cameraman image after 100 iterations, when using FISTA or DS,
is quite comparable, whereas the image recovered by ISTA is still blurry. However, a
valuable tool for measuring the quality of these images is the so-called improvement in
signal-to-noise ratio (ISNR), which is defined as

$$\mathrm{ISNR}(k) = 10 \log_{10} \left( \frac{\|x - b\|^2}{\|x - x_k\|^2} \right),$$

where $x$, $b$ and $x_k$ denote the original, observed and estimated image at iteration $k$,
respectively. Figure 5.3 shows the evolution of the ISNR values when using DS, FISTA
and ISTA to solve (P).
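The ISNR can be computed directly from its definition as ten times the base-10 logarithm of $\|x - b\|^2 / \|x - x_k\|^2$. The following is a minimal sketch that treats the images as flat lists of pixel values; the function name and the sample data are illustrative assumptions.

```python
import math

def isnr(x, b, x_k):
    """Improvement in signal-to-noise ratio of the estimate x_k over the observation b."""
    num = sum((xi - bi) ** 2 for xi, bi in zip(x, b))     # ||x - b||^2
    den = sum((xi - ki) ** 2 for xi, ki in zip(x, x_k))   # ||x - x_k||^2
    return 10.0 * math.log10(num / den)

x = [0.0, 0.0]        # toy "original" image
b = [10.0, 0.0]       # toy "observed" image
x_k = [1.0, 0.0]      # toy "estimated" image, ten times closer to x than b

value = isnr(x, b, x_k)
```

A positive value means that the estimate is closer to the original than the observed image is, while an estimate equal to the observation gives an ISNR of zero.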




Figure 5.3: Improvement in signal-to-noise ratio (ISNR) 



5.2 An $\ell_2 - \ell_1$ regularization problem

The second convex optimization problem we solve is

$$(P) \quad \inf_{x \in S} \left\{ \|Ax - b\|^2 + \lambda \left( \|x\|^2 + \|x\|_1 \right) \right\},$$

where $S \subset \mathbb{R}^n$ is the $n$-dimensional cube $[0,1]^n$ representing the pixel range, $\lambda > 0$
the regularization parameter and $\|\cdot\|^2 + \|\cdot\|_1$ the regularization functional, already used
in [7]. The problem to be solved can be equivalently written as

$$(P) \quad \inf_{x \in \mathbb{R}^n} \left\{ f(x) + g(Ax) \right\},$$

for $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, $f(x) = \lambda \left( \|x\|^2 + \|x\|_1 \right) + \delta_S(x)$ and $g : \mathbb{R}^n \to \mathbb{R}$, $g(y) = \|y - b\|^2$. Thus
$f$ is proper, $2\lambda$-strongly convex and lower semicontinuous with bounded domain and
$g$ is a 2-strongly convex function with full domain, differentiable everywhere and with
Lipschitz continuous gradient having as Lipschitz constant 2. This time we are in the
setting of Subsection 4.4.3, the Lipschitz constant of the gradient of $\theta : \mathbb{R}^n \to \mathbb{R}$,
$\theta(p) = f^*(A^*p) + g^*(-p)$, being $L = \frac{1}{2\lambda} + \frac{1}{2}$. By applying the double smoothing approach
one obtains a rate of convergence of $O\left(\ln \frac{1}{\varepsilon}\right)$ for solving (P).



In this example we take a look at the blobs test image shown in Figure 5.4, which
is also part of the image processing toolbox in Matlab. The picture undergoes the
same blur as described in the previous section. Since our pixel range has changed, we
now use additive zero-mean white Gaussian noise with standard deviation $10^{-3}$ and the
regularization parameter is changed to $\lambda$ = 2e-5.

Figure 5.4: The 272 x 329 blobs test image (panels: original; blurred and noisy)

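We calculate next the sequences of approximate primal solutions $(x_{f,p_k})_{k \geq 0}$ and
$(x_{g,p_k})_{k \geq 0}$. Indeed, for $k \geq 0$ we have

$$x_{f,p_k} = \operatorname*{arg\,min}_{x \in [0,1]^n} \left\{ \lambda \|x\|^2 + \lambda \|x\|_1 - \langle A^*p_k, x \rangle \right\} = \operatorname*{arg\,min}_{x_i \in [0,1],\ i = 1, \dots, n} \left\{ \sum_{i=1}^n \left[ -(A^*p_k)_i x_i + \lambda x_i^2 + \lambda x_i \right] \right\} = \mathcal{P}_{[0,1]^n}\left( \frac{1}{2\lambda} \left( A^*p_k - \lambda \mathbb{1}^n \right) \right)$$

and

$$x_{g,p_k} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \left\{ \langle p_k, x \rangle + g(x) \right\} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \left\{ \langle p_k, x \rangle + \|x - b\|^2 \right\} = b - \frac{1}{2} p_k.$$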
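For the strongly convex regularizer of this subsection no smoothing parameter enters the update for $x_{f,p_k}$. Setting the derivative of the separable objective $-(A^*p_k)_i x_i + \lambda x_i^2 + \lambda x_i$ to zero gives $((A^*p_k)_i - \lambda)/(2\lambda)$, which is then projected onto $[0,1]$. The sketch below illustrates this with hypothetical entries of $A^*p_k$ and invented helper names, and compares the formula with a coarse grid search.

```python
def x_f_l2_l1(ATp, lam):
    """Componentwise projection of ((A* p_k)_i - lam) / (2 * lam) onto [0, 1]."""
    return [min(max((a - lam) / (2.0 * lam), 0.0), 1.0) for a in ATp]

def grid_minimizer(objective, lo, hi, steps=10000):
    """Brute-force minimizer on a uniform grid, used only as a sanity check."""
    best_x, best_val = lo, objective(lo)
    for s in range(1, steps + 1):
        x = lo + (hi - lo) * s / steps
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

lam = 2e-5                           # regularization parameter of this subsection
ATp = [-1e-5, 3e-5, 4e-5, 1e-3]      # hypothetical entries of A* p_k

closed = x_f_l2_l1(ATp, lam)
brute = [grid_minimizer(lambda x, a=a: -a * x + lam * x * x + lam * x, 0.0, 1.0)
         for a in ATp]
```

The update for $x_{g,p_k}$ is the same as in the previous subsection, since $g$ is unchanged.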



Figure 5.5 shows the iterations 50 and 100 of ISTA, FISTA and the double smoothing
(DS) technique together with the corresponding function values denoted by ISTA$_k$,
FISTA$_k$ or DS$_k$. As before, the function values of FISTA are slightly lower than those
of DS, while ISTA is far behind these methods, not only from a theoretical point of view,
but also as can be detected visually. Figure 5.6 displays the improvement in
signal-to-noise ratio for ISTA, FISTA and DS and it shows that DS outperforms the other
two methods from the point of view of the quality of the reconstruction.



6 Conclusions 

In this article we investigate the possibilities of accelerating the double smoothing
technique when solving unconstrained nondifferentiable convex optimization problems. This
method, which assumes the minimization of the doubly regularized Fenchel dual objective,
allows, in the most general case, the reconstruction of an approximately optimal primal
solution in $O\left(\frac{1}{\varepsilon} \ln \frac{1}{\varepsilon}\right)$ iterations. We show that under some appropriate assumptions
for the functions involved in the formulation of the problem to be solved this convergence
rate can be improved to $O\left(\frac{1}{\sqrt{\varepsilon}} \ln \frac{1}{\varepsilon}\right)$ or even to $O\left(\ln \frac{1}{\varepsilon}\right)$.

Figure 5.5: Iterations of ISTA, FISTA and double smoothing (DS) for solving (P)
(panel titles: ISTA = 7.162997e+00, FISTA = 7.474242e-01, DS = 8.008283e-01)